{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 常见距离计算实现"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 欧式距离\n",
    "\n",
    "两向量各对应元素的差值平方和，再根号\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "latex"
    }
   },
   "source": [
    "$$\n",
    "\\sqrt{\\sum_{i=1}^{n} (x_{i} - y_{i})^{2} } \n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.4142135623730951"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "A = np.array([1, 2, 3])\n",
    "B = np.array([1, 3, 4])\n",
    "\n",
    "dist = np.sqrt(np.sum((A - B)**2))\n",
    "dist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.4142135623730951"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from scipy.spatial import distance\n",
    "\n",
    "distance.euclidean(A, B)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 汉明距离 (hamming)\n",
    "\n",
    "两向量之间的汉明距离就是向量之间、各位上不同的元素的个数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.0"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from scipy.spatial.distance import hamming\n",
    "\n",
    "# hamming 函数返回的是不同个数的百分比，直接乘以元素数量就是不同的个数\n",
    "hamming(A, B) * len(A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 余弦距离（余弦相似度）\n",
    "\n",
    "numpy 公式：\n",
    "```python\n",
    "np.sum(A * B) / (np.sqrt(np.sum(A**2)) * (np.sqrt(np.sum(B**2))))\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9810229431759452"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "A = np.array([1, 2, 3])\n",
    "B = np.array([1, 3, 3])\n",
    "\n",
    "np.sum(A * B) / (np.sqrt(np.sum(A**2)) * (np.sqrt(np.sum(B**2))))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9810229431759452"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from scipy.spatial import distance\n",
    "1 - distance.cosine(A, B)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1.        , 0.61237244, 0.57735027],\n",
       "       [0.61237244, 1.        , 0.23570226],\n",
       "       [0.57735027, 0.23570226, 1.        ]])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "from sklearn.metrics.pairwise import cosine_similarity\n",
    "\n",
    "corpus = [\n",
    "    'This is an apple',\n",
    "    'The second is also an apple',\n",
    "    'This is banana',\n",
    "]\n",
    "\n",
    "vectorizer = CountVectorizer()\n",
    "vectors = vectorizer.fit_transform(corpus)\n",
    "cosine_similarity(vectors)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
